ABSTRACT

Over the past several years benchmarking has been developed
into an effective technique for the performance analysis of
computer systems. Relational database machines are relatively
new computer systems for which a benchmarking technique
does not yet exist.

The benchmarking of relational database machines
involves the identification and design of test programs
through which relevant performance data can be gathered and
interpreted. All features of relational database management
must be considered when designing these test programs. The
join operations are an important feature of relational
database management.

The test programs for the join operations necessarily
include the repetition of certain queries during which
specific join parameters are varied. These parameters
include tuple size, relation size, disk placement, and the
use of indices. A number of join operations have been
benchmarked. These operations are equality joins,
inequality joins, three-way joins, and virtual joins (i.e.,
views). In addition, a number of relational database
machine configurations have been utilized for benchmarking
the join operations.

The highlights of the thesis can be found in its contribution
to a benchmarking technique for the join operations
and its conclusions on the performance analyses of various
relational machines in performing joins.
I. INTRODUCTION


A. WHAT IS BENCHMARKING?

The term "benchmark" has its origin in the field of
geographical surveying. A benchmark is a permanent
geographic feature which serves as a landmark for surveying.
The term has evolved to mean a standard or criterion
associated with a particular type of system or product.
This standard serves as a point of reference to which
functionally similar systems or products can be compared.

In the realm of computer science a benchmark consists of
a standard set of instructions or programs. The execution of
the set on one system provides measurements that can be
compared with measurements obtained by running the same
set on another system. This is the essence of computer
system benchmarking: the process of conducting controlled
experiments to collect indicators of the comparative performance
of different computer systems.

B.  THE  "GIBSON  MIX" 

Comparisons of computer systems were prompted by the
increasing, cost-conscious application of such systems in
business and other settings. This interest in the comparative
performance of systems led to controlled experiments on the
systems. In 1970, J.C. Gibson introduced
a system of program sets, or "mixes", by which various types
of workloads could be compared. The "Gibson Mix" approach
to comparing systems is based on testing several sets of
applications in both business and science. The results of
these tests, in the form of execution times, were published. The
problem of selecting a particular computer system could then be
reduced to establishing workloads as multiples of the mix.
By properly balancing execution times and mix multiples,
system evaluators could produce comparative estimates for
total computer systems.
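
The arithmetic behind "workloads as multiples of the mix" can be sketched as follows. This is only an illustration of the idea as described above: the system names, mix names, and execution times are hypothetical and are not taken from Gibson's published figures.

```python
# Hypothetical figures: seconds needed to run one unit of each mix
# on two candidate systems. None of these numbers come from the source.
MIX_TIMES = {
    "system_a": {"business": 12.0, "scientific": 30.0},
    "system_b": {"business": 15.0, "scientific": 20.0},
}


def estimated_workload_time(mix_times, multiples):
    """Sum of (mix multiple x execution time of one mix unit)."""
    return sum(multiples[mix] * mix_times[mix] for mix in multiples)


# A site whose workload is judged to equal 4 business mixes
# plus 1 scientific mix (again, an assumed workload).
workload = {"business": 4, "scientific": 1}

estimates = {
    name: estimated_workload_time(times, workload)
    for name, times in MIX_TIMES.items()
}
```

With these assumed inputs the two estimates can be compared directly; a different balance of mix multiples may reverse the ranking, which is why the multiples must reflect the real workload.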

C. BENCHMARK DESIGN AND OBJECTIVES

Benchmarking as a technique for comparing computer
performance has enjoyed increasing popularity over the past
decade. This approach is appealing both to producers and to
consumers. Basic guidelines have been developed for the
proper use of benchmarks. The benchmark must be representative
of real-world workloads, and the mix of instructions
should be inclusive enough to provide as much relevant data
as possible. Additionally, the relevance of benchmark
content must be justifiable. The benchmark should be carefully
designed, and objectives should be specifically stated
so that the proper sequence of steps in the benchmark
progression can be set down. Objectives may include evaluation
towards procurement, design analysis, component certification,
quality determinations, load analysis, improvement
of performance, or other objectives as determined by those
requiring the benchmark. The benchmark should be tailored
to the objective and deal with those demands or applications
which initially formed the basis of and the requirement for
the benchmarking. The benchmark must be controlled from
design through implementation and throughout the interpretation
of the results.
II. THE BENCHMARKING ENVIRONMENT
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II.  THE  BENCHMARKING  ENVIRONMENT 


The experiments described in this paper have been
conducted on several configurations of an RDM 1100 at the
Data Processing Service Center West, Naval Air Station,
Point Mugu, California. The RDM 1100 and its various
configurations are relational database machines, each of
which is designed to be the backend of UNIVAC 1100 series
computers.

A. THE HOST COMPUTER

The host computer system for which the relational database
machine serves as the backend is the UNIVAC 1100/42. No
modifications  have  been  required  of  the  UNIVAC  operating 
system.  Specially  designed  host-resident  software  has  been 
installed  in  the  UNIVAC. 

B. THE HOST COMPUTER/DATABASE MACHINE INTERFACE

Figure 2.1 depicts the presently available methods for
interfacing between the host and the backend. The first
method, the relational query language, is a command interface.
The second method allows the user to execute a series
of queries by referring to a set of stored commands. The
third method is via user programs written in high-level
programming languages such as COBOL and FORTRAN in which a
subroutine is provided for accessing data stored in the
backend machine.

In  the  RDM,  host  interfacing  is  accomplished  by  both 
parallel  and  serial  interface  modules  (processors)  (see 
Figure  2.2).  Each  interface  module  can  support  up  to  8  host 
C. THE BENCHMARKED RELATIONAL DATABASE MACHINE


The basic relational database machine on which the
benchmarking experiments have been conducted is a modularly
designed, microprocessor-based database computer. The
modules are organized around a single high-speed bus (see
Figure 2.2 again).

1. Technology and Functionality of Modules

a.  The  Database  Processor 

This Z8000-series microprocessor controls the
flow of data by translating user queries into procedures.
Additionally, this processor supervises system resources,
coordinates hardware monitoring, and performs bus arbitration.
Approximately 99% of the processor's code is written in C,
and the processor operates at 1/2 MIPS. If the database accelerator
(described below) is available, the database processor
senses its availability and issues calls for its services.

b.  The  Accelerator 

This high-speed, auxiliary processor, which
executes instructions at 10 MIPS, is built from ECL logic.
It has a three-stage pipeline and is designed to optimize a
well-defined collection of often-used database management
subroutines. The accelerator can filter data at disk
transfer rates.

c.  The  Cache 

This main memory is composed of 64K dynamic RAM
chips and is expandable up to 6 megabytes. System information
and code occupy approximately 360K of this memory.
Cache is allocated in 2K blocks, contiguously whenever
possible. The paging algorithm is basically
Least-Recently-Used, and the system code is never paged out.
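
The paging behavior just described (least-recently-used replacement, with system code never paged out) can be illustrated with a minimal sketch. The class and block identifiers below are hypothetical; the machine's actual paging code is not given in the source, and the text itself says the algorithm is only "basically" LRU.

```python
from collections import OrderedDict


class BlockCache:
    """Toy LRU cache of 2K blocks in which pinned blocks are never evicted."""

    def __init__(self, capacity, pinned=()):
        self.capacity = capacity              # total number of block slots
        self.pinned = set(pinned)             # e.g. system code blocks
        # Pinned blocks occupy slots from the start.
        self.blocks = OrderedDict((b, True) for b in pinned)

    def access(self, block_id):
        """Touch a block; return True on a cache hit, False on a miss."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[block_id] = True
        return False

    def _evict(self):
        # The OrderedDict iterates from least to most recently used.
        victim = next((b for b in self.blocks if b not in self.pinned), None)
        if victim is None:
            raise RuntimeError("cache holds only pinned blocks")
        del self.blocks[victim]
```

For example, with three slots of which one is pinned, touching blocks a, b, a, c in that order evicts b, because a was used more recently and the pinned block is exempt.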
d. Disk Drives and the Secondary Storage


The  disk  controller  module  performs  burst  error 
detection  and  correction  and  retry  without  intervention  by 
the  database  processor.  This  controller  can  manage  from  one 
to  four  disk  drives  with  each  drive  having  a  capacity  of  one 
to  four  disks.  Presently  there  are  two  disks  available  with 
each  disk  capable  of  storing  approximately  600  megabytes  of 
data.

2. Different Accelerator and Cache Configurations Tested

The benchmarking experiments have been conducted on
the following different machine configurations:

a. 1/2-megabyte cache without the database accelerator

b. 2-megabyte cache with the database accelerator

c. 2-megabyte cache without the database accelerator

D.  THE  DATABASES 

The relational database machine handles data in 2K-byte
blocks. With this in mind a synthesized database has been
designed. Tuple (record) lengths of 100 bytes, 200 bytes,
1000 bytes, and 2000 bytes have been chosen, thereby
providing a range of 1 to 20 tuples per block. The
experiments seek to contrast the same operations
performed on relations with different numbers of
tuples per block. This approach may provide
some measure of processor-overhead time versus I/O time.
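
The range of 1 to 20 tuples per block follows from simple integer division. The sketch below assumes that "2K" means 2048 bytes, that tuples do not span blocks, and that no space is reserved for a block header; none of these details is stated in the source.

```python
BLOCK_BYTES = 2 * 1024                  # assumption: "2K" = 2048 bytes
TUPLE_LENGTHS = [100, 200, 1000, 2000]  # tuple lengths chosen in the text

# Whole tuples that fit in one block for each tuple length.
tuples_per_block = {n: BLOCK_BYTES // n for n in TUPLE_LENGTHS}


def blocks_needed(num_tuples, tuple_length):
    """Blocks required to hold a relation, rounding up to whole blocks."""
    per_block = BLOCK_BYTES // tuple_length
    return -(-num_tuples // per_block)   # ceiling division
```

Under these assumptions a 10000-tuple relation of 100-byte tuples and a 500-tuple relation of 2000-byte tuples both occupy 500 blocks, which is the kind of pairing that lets processor overhead per tuple be contrasted with I/O per block.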


1. Database Generation


Standard templates for each of the four different
tuple lengths have been designed. Table I describes these
templates. Note that within each template there are attributes
(fields) that are common to all four templates:
sequential integers and random integers. The attributes of
sequential and random integers can be used to enforce
different orderings of the same data. Each template also
contains attributes specified with values uniformly distributed
over a number of enumerated values. By ensuring a
specified distribution, the reliability of equality joins
can be assured.

The actual relations for the experimental databases
have been generated on an IBM 3033 system in batch mode and
have been transferred to tape for transport to the UNIVAC
system. For each of the four tuple lengths, relations have
been generated with 500, 1000, 2500, 5000, and 10000 tuples.
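
A generator in the spirit of these templates might look like the sketch below. The field names (seq, rand, code), the use of a shuffled permutation for the random integers, and the round-robin assignment of enumerated values are all assumptions made for illustration; the actual templates are those of Table I.

```python
import random


def generate_relation(num_tuples, enumerated_values, seed=0):
    """Build a list of tuples with sequential, random, and enumerated fields.

    seq  - sequential integers 0 .. num_tuples-1
    rand - the same integers in shuffled order (one way to get a random key)
    code - enumerated values assigned round-robin, hence uniformly distributed
    """
    rng = random.Random(seed)            # fixed seed keeps runs repeatable
    random_keys = list(range(num_tuples))
    rng.shuffle(random_keys)
    return [
        {
            "seq": i,
            "rand": random_keys[i],
            "code": enumerated_values[i % len(enumerated_values)],
        }
        for i in range(num_tuples)
    ]
```

Sorting the same relation on seq or on rand gives two different physical orderings of identical data, and the round-robin field guarantees that an equality qualification on it selects a known fraction of the tuples.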

2. Database Creation, Loading, and Disk Placement

In the environment of the database machine, the
number of 2K-byte blocks assigned to a database is specified
with the CREATE DATABASE command in the query language
(Section II.E further describes the query language). Since
database allocations are made in whole numbers of cylinders,
the number of blocks specified will be rounded up to
the next whole number of cylinders. Once the allocation is
made, the number of blocks actually allocated is returned to
the user. The syntax for database creation in the query
language is:


CREATE DATABASE (name) WITH (options)


TABLE I

Standard Tuple Templates

[Table body not recoverable; surviving column headings: RELATION, FIELD, TYPE]

options:

demand  -  the  number  of  blocks  to  allocate 
disk  -  the  disk  on  which  allocation  is  desired 


For example,

CREATE DATABASE NPSTEST with demand = 1000 on "DSK001",
demand = 2000 on "DSKSYS"

would set aside 1000 blocks on the disk named "DSK001" and
2000 blocks on the disk named "DSKSYS" for the database
"NPSTEST".


Once the database has been assigned disk space,
relations in that database may be created as follows:

CREATE relation-name ((field name) = (format), ...,
(field name) = (format))

The above command would set up an empty relation in the
database to which tuples could then be appended. A database
is opened by simply entering: "OPEN (database name)".

In order to bulk load records into relations in
specified databases, utility programs have been provided.
The experimental relations that have been generated on the
IBM 3033 system and subsequently loaded into the UNIVAC
system have been translated into the backend machine using
these utility programs.

Initially, we have attempted to manipulate the
placement of relations in a database. That is, once a database
has been allocated disk space by the CREATE
command, we have tried to force a specific placement of a
relation on a particular disk. We have assumed that for a
join, optimization can be achieved if the relations to be
joined are physically located on different disks. However,
our attempts at placement have proven futile. The designers
of the backend machine utilized certain placement algorithms.
These algorithms are proprietary and are, therefore,
unavailable for our modification.

The query language for the machine allows the creation
of indices for quicker data access. The creation of
these indices and their use is described in the following
section.

3. Indices

Simply stated, indices are designed to provide more
direct access to stored data. The query language for the
relational database machine allows for the creation of two
different types of indices. A "clustered" index is one for
which the tuples are physically stored in the order of the
values in the specified field. A "nonclustered" index is one that is
created for a field or group of fields on which the tuples
are not physically ordered.

Note  that  in  NPSTEST  all  of  the  relations  have  been 
created  with  clustered  indices.  Also,  as  they  are  described 
below,  indices  for  certain  relations  in  other  experimental 
databases  may  be  created,  destroyed,  and  then  recreated 
during  the  course  of  the  run  stream  for  a  particular  join 
experiment. 

4. The Experimental Databases

Table II describes the experimental databases. As
they are explained more fully below in individual experiment
descriptions, the size of the databases, the number of relations
in the databases, and the indices employed are all
factors in the measurements obtained.
TABLE II

[Table body not recoverable]

E. THE QUERY LANGUAGE FOR THE DATABASE MACHINE

Incorporated as an integral part of the relational database
system is the query language (RQL in the case of the
RDM 1100). This query language is designed to be both a
definition language and a manipulation language for the data
stored in the machine.

1. Semantics and Syntax

The use of the CREATE command for both databases
and relations has previously been discussed. The following
discussion seeks to describe those features of the query
language that are essential to an understanding of the
nature of experiments that have been conducted on the join
operation.

a.  BEGIN  (transaction  name) 

This  command  is  used  whenever  multiple  RQL  commands  are  to 
be  treated  as  a  single  command. 

b.  END  (transaction  name) 

This  command  is  used  at  the  end  of  the  group  of  RQL  commands 
under  BEGIN. 

c.  CREATE  VIEW  (view  name) 

This command is used to set up a virtual relation within a
database.

d.  DEFINE  (stored  command  name) 

This command is used to define a stored command for a particular
database. The command so defined can be referenced
simply by its name.

e. DESTROY (object name)

This  command  is  used  to  eliminate  databases,  relations, 
indices,  views,  stored  commands,  or  other  constructs  from 
the  system. 

f.  RANGE  of  (range  variable)  is  (relation  name) 

Range variables are used to allow the user to establish a
synonym for a relation name. Once this synonym is established
it can be used in lieu of the relation name.

g. RETRIEVE (target list) WHERE (qualification)

This is the most essential command for performing join operations.
Relations or portions of relations are pulled from
storage and displayed for the user. The data retrieved
depends on the user-supplied qualification, which may include
single or multiple equalities and inequalities. Up to 22
fields can be specified in the target list.

h. RETRIEVE (variable name = GETTIME ())

GETTIME is a function in RQL that allows the user to
retrieve a time value from the RDM clock. The integer
retrieved is in units of 1/60 second. As will be described
below, we used these times for our computations.
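
Since the clock value is an integer count of 1/60-second units, an elapsed time is obtained by differencing two readings and dividing by 60. The helper below is a minimal sketch; the function name is hypothetical, and it does not handle any wrap-around of the clock, whose behavior the source does not describe.

```python
TICKS_PER_SECOND = 60   # one clock unit = 1/60 second, per the text


def elapsed_seconds(start_ticks, end_ticks):
    """Convert two clock readings into an elapsed time in seconds."""
    return (end_ticks - start_ticks) / TICKS_PER_SECOND
```

For instance, readings of 1200 before a join and 1500 after it correspond to 5 seconds of elapsed time.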

2. The Experimental Queries

Queries  have  been  designed  utilizing  those  features 
of  RQL  described  above.  The  query  streams  have  been 
designed  as  sets  of  transactions,  and  the  joins  have  been 
designed  as  stored  commands  so  that  the  commands  could  be 
pre-parsed  in  order  that  parsing  time  would  be  eliminated 
from the join time measurements. The number of fields
targeted for a join is described below in individual experiment
descriptions.
III. THE BENCHMARKING


A. GOALS

The experiments described in this paper are directed
towards the development of a procedure by which database
machines may be benchmarked. The efforts described here
represent only a portion of the research. Interested
readers are directed to [Ref. 1], [Ref. 2], and [Ref. 3] for
additional research on selection and projection, database
administration, and database generation.

The goal of these experiments is not to make a definitive
pronouncement on the performance of the various
configurations of the RDM 1100. Rather, the goal is to
learn how to design benchmarks and interpret the results of
the benchmarking experiments. Towards this end, the methodology
must be machine independent, and the workload model
must be based on a mix of database management statements.

B. THE METHODOLOGY

The workload has been modeled as a collection of queries
in the relational query language (RQL). The primary benchmark
kernel for the join operations is the RETRIEVE statement
with associated qualifications. In designing this
workload, classes of queries have been identified. These
include data-intensive and overhead-intensive classes. The
workload has been constructed as a combination of queries
from each class. The query language has functioned as the
primary tool for performance measurement since neither
software nor hardware probes have been available for use in
conducting these experiments. Using the functions provided
in the query language, elapsed times are measurable from
the database machine clock in seconds. Since the goal of
these experiments is to learn the effects of varying parameters
on machine performance and not absolute machine performance,
this "rough" measurement technique is acceptable.

The operating system of the host machine allows the use
of pre-defined commands and queries known as scripts, which
has eliminated the fluctuation of terminal time.
Additionally, the fluctuation of the parse time has been
eliminated by using pre-parsed commands stored in the database.
However, some fluctuation is introduced by the
query post-processor, which formats data for screen display,
but this is not significant within the query sets.

The initial approach to defining relevant queries has
been to concentrate on the repetition of certain operations.
During this repetition, given factors have been varied to
ascertain effects on performance. For the join operations,
tuple sizes, database sizes, index structure, disk placement,
and machine configuration have been varied.

By and large, the query streams have been run in a controlled
environment. To offset the workload variability of
the host machine, runs have been conducted during times of
minimal activity on the host. Likewise, use of the database
machine has been restricted to a single user.

C. THE JOIN OPERATIONS

Several groups of experiments have been conducted in
which certain parameters have been varied during repetitions
of the same experiment. These experiments have been
designed to obtain measurements on one particular aspect of
relational database management: the join operations.

1. A Formal Definition of a Join

Simply stated, a join is a composition of two or
more relations. In relational algebra, a join can be
expressed as follows: the θ-join of column x of table R and
column y of table S is a table whose rows are those rows of the
Cartesian product of R and S for which the comparison
x θ y holds true, where θ is a mathematical comparison
operator, x is the row element from R, and y is the row
element from S.

2. The Join in the Benchmarked Query Language

In  the  benchmarked  query  language,  RQL,  a  join  is 
accomplished  using  the  RETRIEVE,  RANGE,  and  qualifier  WHERE 
commands.  For  example: 

Given the above relations, a typical join query in RQL could
be:

RANGE of P is Personnel
RANGE of D is Department
RETRIEVE (P.lastname, D.phone, D.office)
WHERE P.dept = D.name


This  query  would  return: 

D. EQUALITY JOINS


1. The Definition and Examples

An equality join is one in which θ is defined as the
mathematical equality (i.e., =). That is, the statement
following the qualifier WHERE in RQL contains either a
single equality or multiple equalities. For example, using the
relations described above, the following retrievals represent
two different equality joins:

RETRIEVE (P.lastname, D.phone)
WHERE P.dept = D.name and P.age = "25"

or

RETRIEVE (P.lastname, D.phone)
WHERE P.dept = D.name and D.name = "OPS"
and P.age = "25"

2.  The  Databases  Used 

Equality joins represent the vast majority of experiments
conducted during this research. Equality joins have
been conducted on all of the databases listed in Table II.

3. Queries Used

Equality joins have been run with both single and
multiple qualifications (i.e., single or multiple equalities
in the WHERE clause). The majority of the joins have
been conducted on single qualifications, and the discussions
below focus primarily on those experiments. The
multiple-qualification joins will be discussed separately.
The singly-qualified joins are equated on the KEY field
of each relation.

4. Results


As  previously  explained,  the  methodology  emphasizes 
varying  parameters  throughout  the  repetition  of  the  same 
group  of  experiments.  The  queries  and  their  results  are 
presented  below,  and  they  are  grouped  by  the  parameter  that 
has  been  varied. 

a.  Variability  of  Relation  Size  and  Tuple  Size 

Figure 3.1 depicts three joins of relations
whose tuple size is 100 bytes. The first equality join
involves a relation of 500 tuples and another relation of
1000 tuples. The second equality join involves a relation
of 2500 tuples and another of 5000 tuples. The third
equality join involves a relation of 5000 tuples and another
of 10000 tuples. It is evident that the join times
increase linearly as the number of tuples being joined
increases.

We now vary the tuple size for all three relations.
Thus, we benchmark the three relations whose tuple
size is 200 bytes. This is depicted in Figure 3.2. The
benchmark of the relations whose tuple size is 1000 bytes
is depicted in Figure 3.3, and the benchmark of the relations
whose tuple size is 2000 bytes is depicted in
Figure 3.4. The linearity demonstrated earlier in Figure
3.1 is again evident in these joins.

Figure 3.5 is a compilation of Figures 3.1, 3.2,
3.3, and 3.4 in which the slopes (or the rates) of linearity
may be compared. It is important to note that the larger
the tuple size, the steeper the slope.
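
One way to compare the slopes is to fit a least-squares line to each set of measurements. The sketch below shows the calculation only. The join times are hypothetical placeholders, since the measured values appear only in the figures, and taking the x-values as the combined tuple counts of each pair of relations is likewise an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Combined tuple counts for the three joins (500+1000, 2500+5000, 5000+10000).
tuples_joined = [1500, 7500, 15000]

# Hypothetical join times in seconds for a small and a large tuple size.
small_tuple_times = [4.0, 16.0, 31.0]
large_tuple_times = [8.0, 32.0, 62.0]

small_slope, _ = fit_line(tuples_joined, small_tuple_times)
large_slope, _ = fit_line(tuples_joined, large_tuple_times)
```

With measured times substituted for the placeholders, the ratio of the fitted slopes gives a single number for how much each additional tuple costs at one tuple size relative to another.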

Figure 3.1 Benchmarked Relations - Small Tuple Size (fixed tuple size of 100 bytes, database NPSTEST; horizontal axis: tuples joined)

Figure 3.2 Benchmarked Relations - Medium Tuple Size

Figure 3.3 Benchmarked Relations - Large Tuple Size (fixed tuple size of 1000 bytes, database NPSTEST)

Figure 3.5 Relative Performance - Changes in Tuple Size (relative performance changes in join time due to changes in tuple size)

b. Variability of Database Size in Terms of Number of Blocks

Figure 3.6 depicts three join operations over a
large database of 31350 blocks, NPSTEST. Additionally,
represented in Figure 3.6 are the same three joins over
three small databases, NPS4 of 150 blocks, NPS5 of 750
blocks, and NPS6 of 1500 blocks. The results of these
queries reveal that the database size in blocks is not a
significant factor in join time.

c.  Variability  of  Database  Disk  Placement 

Each of the databases NPS1, NPS2, NPS3, NPS11,
NPS12, and NPS13 contains only two relations. NPS1, NPS2,
and NPS3 have been created on the same disk. NPS11, NPS12,
and NPS13 have been created separately, and each occupies
two disks. Figure 3.7 depicts the time for joins on
NPS1, NPS2, and NPS3 versus the same joins conducted on
NPS11, NPS12, and NPS13. The results strongly suggest that
database disk placement, especially for relatively small
databases, is not a major factor in join time.

d.  Variability  of  Index  Structure 

A query stream has been run on NPS11, NPS12, and
NPS13. During the run the index structure on the relations
in the databases has been modified from clustered to
nonclustered and then eliminated. Figure 3.8 depicts the
join times in each situation. From the results obtained, it
can be reasonably assumed that for relations of this size
there is no significant difference between join times on
clustered and nonclustered indices. However, the join times
for those relations with no indices have increased
exponentially.

Figure 3.7 The Impact of Disk Placements on Joins

e. Variability of Machine Configuration

As stated earlier, during the course of these
experiments, the database machine being benchmarked has
operated under three different configurations: 1/2-megabyte
cache memory without a database accelerator, 2-megabyte
cache memory without a database accelerator, and 2-megabyte
cache memory with a database accelerator. Joins over relations
with the fixed tuple size of 100 bytes in the database
NPSTEST have been conducted on all three
configurations. The comparative results of these joins are
depicted in Figure 3.9. These results show that an increase
in cache memory size from 1/2 to 2 megabytes improved join
time by 27% to 31%. The addition of the database
accelerator to the 2-megabyte cache improved the join
time by only 6% to 12%. These results would
seem to indicate clearly that, for the join operation, a
larger cache memory is much more effective than the addition
of a database accelerator.
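
The improvement percentages quoted above are presumably relative reductions in join time, that is, (time before - time after) / time before. The helper below shows that calculation under that assumption. The times used in the example are hypothetical, since the measured values appear only in Figure 3.9.

```python
def percent_improvement(time_before, time_after):
    """Relative reduction in elapsed time, expressed as a percentage."""
    return 100.0 * (time_before - time_after) / time_before


# Hypothetical join times in seconds for one join on the three configurations.
half_mb_no_accel = 100.0
two_mb_no_accel = 70.0
two_mb_with_accel = 63.0

cache_gain = percent_improvement(half_mb_no_accel, two_mb_no_accel)
accel_gain = percent_improvement(two_mb_no_accel, two_mb_with_accel)
```

Note that each gain is computed against its own baseline, so the accelerator's improvement is measured relative to the 2-megabyte configuration without the accelerator rather than to the 1/2-megabyte one.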

5. Selection Experiments

In addition to the equality joins described so far,
joins have been run with an additional qualification designed to
select only a certain portion of the joined tuples for
display. The number of tuples to be displayed is 5%
of the number of tuples in the smaller of the two
relations in each join. To accomplish this objective for
the join of the 500-tuple relation and the 1000-tuple relation,
the additional qualification imposes a "< 25"
restriction on the KEY attribute. That is, the relations
have been joined on the equality of the KEY field in each
relation, and there has been the additional qualifier that
those tuples to be displayed must have a KEY value that is
less than "25". For the join of the 2500-tuple relation and
the 5000-tuple relation the restriction is "< 125", and for
the join of the 5000-tuple relation and the 10000-tuple
relation the restriction is "< 250".

Figure 3.9 The Impact of Machine Configurations on Joins
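
The three restriction bounds follow from taking 5% of the smaller relation in each pair. The short sketch below reproduces them; it assumes, as the text implies but does not state outright, that KEY values are consecutive integers, so that a "< bound" restriction selects approximately that many tuples.

```python
# The three pairs of relation sizes joined in the experiments.
JOIN_PAIRS = [(500, 1000), (2500, 5000), (5000, 10000)]


def selection_bound(size_a, size_b, percent=5):
    """KEY bound that selects the given percentage of the smaller relation."""
    return min(size_a, size_b) * percent // 100


bounds = [selection_bound(a, b) for a, b in JOIN_PAIRS]
```

The computed bounds agree with the "< 25", "< 125", and "< 250" restrictions given in the text.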


Figure 3.10 depicts the response times for these join
selections. Figure 3.11 depicts the response times for the
same joins without the 5%-selection restriction. A
comparison of the results for each join reveals that the
difference in response time becomes proportionally greater
as the relations being joined become larger. These
significant differences are likely due to at least two
factors. First, there is an I/O overhead that undoubtedly
comprises a major portion of the difference. Second, it is
highly probable that for this type of join the select
operation is performed first, and then the actual join is
performed. A comparison of Figures 3.10 and 3.11 would
support this hypothesis.
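The select-first hypothesis can be illustrated with a small
sketch. It assumes a simple nested-loop join and consecutive
KEY values; neither is stated in the text, and the join
algorithm actually used by the machine is not known. The
point is only that applying the restriction before the join
yields the same tuples while examining far fewer pairs.

```python
def nested_loop_join(left, right):
    """Equality join on KEY values; returns (matches, comparisons made)."""
    matches, comparisons = [], 0
    for a in left:
        for b in right:
            comparisons += 1
            if a == b:
                matches.append(a)
    return matches, comparisons


def join_then_select(left, right, cutoff):
    """Join the full relations, then keep only KEY < cutoff."""
    matches, comparisons = nested_loop_join(left, right)
    return [k for k in matches if k < cutoff], comparisons


def select_then_join(left, right, cutoff):
    """Apply KEY < cutoff to both relations first, then join."""
    left_small = [k for k in left if k < cutoff]
    right_small = [k for k in right if k < cutoff]
    return nested_loop_join(left_small, right_small)


left = list(range(500))    # hypothetical KEY values 0..499
right = list(range(1000))  # hypothetical KEY values 0..999

late, late_cost = join_then_select(left, right, 25)
early, early_cost = select_then_join(left, right, 25)
print(late == early, late_cost, early_cost)
```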


6.  Other Equality-Join Experiments


Figure 3.12 depicts a comparison between two sets of three
joins on the same relations with nonclustered indices. The
first set requires no relations to be sorted. The second set
requires the relations to be sorted on an attribute other
than the KEY attribute on which the index is based. The
comparative results of the runs for these joins are close,
and the plotted curves for the response times cross each
other. This may indicate that sorting the relations on a
non-key attribute does not improve the join time.

Figure 3.13 depicts a comparison between two sets of the
same three joins for which the expression of the equality
predicate has been reversed. For these particular joins the
reversal of the equality predicate appears to be
insignificant as a factor in join time.
Figure 3.10 Three 5%-Join Selections (plot title: Three
Joins with Selection Limited to 5% of the Smaller Relation
Being Joined; horizontal axis: tuples joined)

Figure 3.12 Joins on Sorted and Unsorted Relations
E.  INEQUALITY JOINS


A  limited  number  of  inequality  joins  has  been  conducted 
during  the  course  of  these  experiments. 

1.  A Definition

For these experiments an inequality join is one in which θ
is defined as a mathematical inequality. That is, the
statement following the qualification WHERE in RQL contains
either "!=" or "<" or ">". This qualification has been
imposed on the KEY attribute.

2.  Experiments

Inequalities  have  been  applied  to  the  join  of  a 
500-tuple  relation  and  a  1000-tuple  relation  and  to  the  join 
of  a  2500-tuple  relation  and  a  5000-tuple  relation. 

3.  Disastrous Results

The results of these joins have proven to be disastrous.
Even for the smaller join of the 500-tuple relation and the
1000-tuple relation, the response time has run into hours.
This long response time has jeopardized the integrity of the
experiments, since the load on the host machine fluctuated
significantly during the course of the run. The results
suggest that the machine cannot support inequality joins
with any reasonable response time.
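One plausible contributor to these response times, offered
here only as an illustration and not as a finding of the
experiments, is the size of the result itself. The sketch
below counts the qualifying pairs for a 500-tuple and a
1000-tuple relation under different predicates. The
consecutive KEY values are an assumption.

```python
def count_join_results(left, right, predicate):
    """Count the pairs (a, b) whose KEY values satisfy the join predicate."""
    return sum(1 for a in left for b in right if predicate(a, b))


left = range(500)    # hypothetical KEY values 0..499
right = range(1000)  # hypothetical KEY values 0..999

equal = count_join_results(left, right, lambda a, b: a == b)
less = count_join_results(left, right, lambda a, b: a < b)
not_equal = count_join_results(left, right, lambda a, b: a != b)

print(f"equality:   {equal} result tuples")
print(f"less-than:  {less} result tuples")
print(f"not-equal:  {not_equal} result tuples")
```

Under these assumptions the inequality predicates produce
several hundred times as many result tuples as the equality
predicate on the same relations.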

F.  THE THREE-WAY JOIN

1.  A Definition and Example

For these experiments a three-way join is simply a
composition of three relations via equality joins. The three
relations have been joined on the equality of the KEY
attribute of each relation. That is, for relations A, B, and
C, the join has been accomplished WHERE A.KEY = B.KEY and
B.KEY = C.KEY.
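The composition of two equality joins into a three-way join
can be sketched as follows. This is a minimal illustration,
not the machine's method: the hash-based lookup, the
attribute names a_val, b_val, and c_val, and the scaled-down
relation sizes are all assumptions made for the example.

```python
def equality_join(left, right):
    """Join two relations (lists of dicts) on equality of the KEY attribute."""
    # Index the right-hand relation by KEY for direct lookup.
    index = {}
    for row in right:
        index.setdefault(row["KEY"], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row["KEY"], []):
            joined.append({**row, **match})
    return joined


# Hypothetical relations A, B, and C, scaled down from 500/1000/2500 tuples.
a = [{"KEY": k, "a_val": k * 10} for k in range(5)]
b = [{"KEY": k, "b_val": k * 100} for k in range(10)]
c = [{"KEY": k, "c_val": k * 1000} for k in range(25)]

# WHERE A.KEY = B.KEY and B.KEY = C.KEY
three_way = equality_join(equality_join(a, b), c)
print(len(three_way))
```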


2.  Experiments

The three relations that have been joined are a 500-tuple
relation, a 1000-tuple relation, and a 2500-tuple relation.
No selection restriction has been imposed on the join.

The response time for this query is .8114 minutes. A two-way
join, under similar conditions, of the same 500-tuple and
1000-tuple relations has been accomplished in .7011 minutes.
The small increase in response time from the two-way join to
the three-way join, .1103 minutes (15.7%), would appear to
further demonstrate the significance of the one-time I/O
overhead in joins. In other words, regardless of the number
of ways a join is to be conducted, the one-time I/O overhead
would consume a substantial portion of the join time. In
this case, the overhead consumes about 65% of the three-way
join time.
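The quoted increase can be checked directly from the two
response times. The short sketch below uses only the figures
given in the text; the variable names are illustrative.

```python
two_way = 0.7011    # minutes, two-way join (from the text)
three_way = 0.8114  # minutes, three-way join (from the text)

increase = three_way - two_way
percent_increase = 100 * increase / two_way  # relative to the two-way join

print(f"increase: {increase:.4f} minutes ({percent_increase:.1f}%)")
```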


G.  JOINS VERSUS VIEWS

1.  The View in the Benchmarked Query Language

In RQL the CREATE VIEW command is used to set up a virtual
relation, which is composed of attributes of one or more
relations. The VIEW is not physically a relation. Rather,
its definition is stored in the database. The following
example creates a new virtual relation, LOCATOR:

RANGE of P is Personnel
RANGE of D is Department

CREATE VIEW LOCATOR (P.name, D.name, D.office, D.phone)
WHERE P.dept = D.name
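The idea that only the view's definition is stored, with its
tuples produced when the view is used, can be sketched
outside RQL. The following is an analogy only, not a
description of how the database machine implements views.
The sample tuples and the function name locator are
hypothetical.

```python
# Hypothetical base relations; the values are invented for illustration.
personnel = [
    {"name": "Adams", "dept": "Supply"},
    {"name": "Baker", "dept": "Admin"},
]
department = [
    {"name": "Supply", "office": "B-101", "phone": "x1234"},
    {"name": "Admin", "office": "A-200", "phone": "x5678"},
]


def locator():
    """Evaluate the LOCATOR definition against the current base relations.

    Only this definition is kept; no LOCATOR tuples are stored.
    """
    return [
        (p["name"], d["name"], d["office"], d["phone"])
        for p in personnel
        for d in department
        if p["dept"] == d["name"]
    ]


before = locator()
personnel.append({"name": "Clark", "dept": "Admin"})
after = locator()  # reflects the new tuple without any stored copy
print(len(before), len(after))
```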


2.  Experiments  on  Views 

The views have been defined and stored in the appropriate
databases before their use for comparison to join
operations. For both the views and the joins, projection has
been limited to five attributes, but no restriction has been
imposed on selection.

The views have been created from a 500-tuple relation and a
1000-tuple relation; a 2500-tuple relation and a 5000-tuple
relation; and a 5000-tuple relation and a 10000-tuple
relation. These relations exist in databases NPS11, NPS12,
and NPS13, respectively. Likewise, the joins have been
accomplished on these same relations and databases.

Figure 3.14 depicts the comparative response time for each
of the three situations. The remarkable similarity in
response times between views and joins for these experiments
would seem to indicate that the views are no more expensive
or inefficient to use than the joins. In certain situations,
however, the views could be of greater value, since they
require very little disk space as compared to the physical
space needed by the tuples of the joins. Additionally, the
view appears to provide the user greater flexibility for
controlling access to the database.
Figure 3.14 Joins Versus Views (plot title: Comparisons of
Joins Versus Views on the Same Relations; horizontal axis:
tuples joined)


IV.  CONCLUDING REMARKS


A.  GENERAL COMMENTS

The experiments discussed above have revealed several
interesting results, notably the consistent linearity in
join times and the apparently significant join overhead
undoubtedly resulting from bus contention. Figure 4.1
illustrates both of these characteristics.

More specifically, Figure 4.1 depicts the total join time
for various numbers of blocks of joined data. The inherent
overhead is clearly evident for access to fewer than 1000
blocks, while those joins involving 1000 or more blocks
clearly demonstrate the consistent linearity previously
discussed.

As also previously discussed, the GETTIME function in RQL
has been the only measurement tool employed. Although no
hardware or software probes have been available, the
experiments that have been run using GETTIME have provided
enough information so that some statement concerning the
mean attainable block access time can be made. Figure 4.2
depicts the average block access time for each tuple
template and the effects on this average as the join has
been repeated over increasingly larger relations (in the
number of tuples).

In Figure 4.2 it is evident that the overhead of the initial
access is being absorbed as the size of the relations being
joined increases. By repeating the same join for increases
in both block size and number of tuples accessed, some
representative mean access times can be ascertained. That
is, the access time curves will approach some asymptotic
lower bound.
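The convergence of the mean access time is what one would
expect if, as the linearity in Figure 4.1 suggests, the
total join time consists of a fixed overhead plus a cost
proportional to the number of blocks accessed. The following
is a sketch under that assumption. The symbols T_0 (one-time
overhead), c (per-block cost), and n (number of blocks
accessed) are introduced here for illustration and do not
appear in the experiments themselves.

```latex
T(n) = T_0 + c\,n
\qquad\Longrightarrow\qquad
\bar{t}(n) = \frac{T(n)}{n} = \frac{T_0}{n} + c
\;\xrightarrow[\,n \to \infty\,]{}\; c
```

Under this model the term T_0/n dominates the mean for small
relations and vanishes as the relations grow, so c plays the
role of the asymptotic lower bound.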
Figure 4.2 also reveals that for this particular database
machine it is more efficient (or profitable) to perform
joins on larger relations. The access times for the smaller
relations are much higher than the access times for the
larger relations. As the size of the relation increases, the
mean access time converges to a representative number. This
number, the mean access time, can be considered an important
characteristic of this particular benchmarking experiment.

B.  A COMPARISON OF DIFFERENT ACCELERATOR/CACHE CONFIGURATIONS

This benchmarking experiment has not been designed as an
analysis of several differently configured RDM 1100s.
However, while the benchmarking has been in progress, the
availability of more cache and the database accelerator has
stimulated much interest in the performance differences
among the machine configurations. Therefore, considerable
time has been expended on accumulating comparable data for
each of the three configurations on which experiments have
been run.

In Chapter III there is a brief discussion of the
differences in join times for the relations of 100-byte
tuples. The following discussion focuses on the 24 joins
conducted on the database, NPSTEST, for each of the three
configurations.

Table III summarizes the average percentage decrease in join
time for each join as the amount of cache is increased from
1/2 megabyte to 2 megabytes. Table III also summarizes the
further decrease in join time as the database accelerator is
added to the 2-megabyte cache configuration. This summary
reveals larger decreases in the join time as the sizes of
the relations being joined increase. In other words, as the
initial join overhead is absorbed, the benefit of the
additional cache grows, with the decrease in join time
reaching approximately 59%. Correspondingly, it appears that
the effects of adding a database accelerator to the
2-megabyte cache are less significant for the larger
relations, although in all cases there is some improvement.

TABLE III

Comparison of Joins Conducted on Different Machine

C.  THE  METHODOLOGY  AND  ITS  LIMITATIONS 

The methodology that has been discussed in this paper has
fundamentally sound origins, and the experimental approach
of varying join parameters has provided, and should continue
to provide, relevant information from which insight can be
drawn. However, as discussed above, benchmarking is a
relatively new area of research in computer science, and
certainly the techniques that have been applied throughout
the course of these experiments can be improved and refined.


A definitive performance pronouncement on the RDM 1100 has
not been the ultimate goal, owing to the use of the GETTIME
function of RQL. Despite its "coarseness" in obtaining
performance measurements, the GETTIME function has been
deemed accurate enough for the purposes of our experiments.

Actually, this function has been considered sufficiently
accurate in view of the lack of more accurate measurement
tools. Probes have not been available, and software packages
for performance data collection have been delayed and are
unavailable for these experiments. Future attempts to
benchmark such a system should utilize additional methods
for determining relevant performance data.

The benchmarking of the RDM 1100 is a project of seemingly
low priority at the command which houses the host UNIVAC
system. Existing workloads demand vast amounts of the
system's resources, and in reality it has been quite
difficult to "control" the environment in which these
experiments have been conducted. Thus the load conditions of
the host system may have compromised the integrity of some
results. Additionally, the majority of the experiments have
been conducted from a remote terminal, which has probably
further degraded the experimental results. Obviously,
on-site, strictly controlled experimentation is the ideal
practice for benchmarking experiments.

Our inability to control the host environment raises yet
another issue. The goal of the experiments has been to
collect measurements on joins for which certain parameters
are varied. However, a major parameter has not been varied.
That parameter is the load condition of the host. As
described above, attempts have been made to run experiments
at times of minimal host activity. In actual practice, the
database machine is likely to be benchmarked during periods
of peak host activity. Future benchmarking efforts should
take this into consideration, and attempts should be made to
control and vary host load conditions as part of the mix of
query scripts. In view of the minimal host activity, the
results we have obtained may be considered as the optimal
performance of the RDM 1100 for join operations.

As the deadline for submission of this thesis has drawn
near, planned experiments have been cancelled from the
testing agenda. A "time crunch" has resulted from a variety
of sources. Primary among these has been the continuing
requirement to correct software deficiencies that have been
identified as a result of the experiments conducted.
Likewise, the changing of the database machine configuration
has severely cut into the time available to run the full set
of planned experiments. In essence, although a great deal of
relevant data has been collected, the consistency of some
data may be questionable, since only a limited number of
experiments has been conducted in each area of
experimentation.
Despite these limitations and deficiencies, the experiments
that have been conducted have provided enough relevant
information from which valuable conclusions can be drawn.
The results of the join experiments described here, when
combined with the results of selection and projection
experiments, comprise a substantial starting point for the
comparison of similar database machine architectures. They
provide a solid framework for benchmarking relational
database machines.